ICU-23350 Support name aliases of type correction in UnicodeSet \N#3918
ICU-23350 Support name aliases of type correction in UnicodeSet \N#3918eggrobin merged 1 commit intounicode-org:mainfrom
Conversation
|
@markusicu I thought this was covered by ICU-8963, but while that ticket mentions it in passing at the end, it is about something else. Shall I file a ticket? I think I could self-approve such a ticket, since the TC has approved the proposal https://groups.google.com/a/unicode.org/g/icu-design/c/Tx2123ejkTI/m/ZDLGzY-XDQAJ which included this (under Name aliases and UAX44-LM2 in \N and \p{Name=…}). |
Right. That one is about changing the data structure so that we can store multiple aliases, and adding API (constants) for selecting among them. Maybe new API altogether. And a little bit of UnicodeSet parser work to wire in API changes as needed.
Yes -- I can't find one for this.
+1 |
icu4c/source/common/uniset_props.cpp
Outdated
| for (const UCharNameChoice nameChoice : | ||
| std::array{U_EXTENDED_CHAR_NAME, U_CHAR_NAME_ALIAS}) { | ||
| ec = U_ZERO_ERROR; | ||
| UChar32 ch = u_charFromName(nameChoice, buf, &ec); |
There was a problem hiding this comment.
FYI: The feature is good, but it will make this even slower :-(
We should really add an API that we ask to consider all aliases, or some set of several aliases. (Different ticket & PR)
icu4c/source/common/uniset_props.cpp
Outdated
| const UChar32 result = getCharacterByName( | ||
| std::u16string_view(pattern_).substr(start, parsePosition_.getIndex() - 1 - start)); | ||
| if (result == U_SENTINEL || (hex.has_value() && result != hex) || | ||
| (literal.has_value() && result != literal)) { |
There was a problem hiding this comment.
please indent more than the if-body
There was a problem hiding this comment.
Done. We should probably figure out how to tell clang-format to do that, it indents where the ( is, and if ( is the same length as our indent…
There was a problem hiding this comment.
i don't know why it does that; in google3 (where the indent is 2 spaces), the effect is that the continuation is indented more than the body; it should make that work with a larger indent as well
There was a problem hiding this comment.
Well, it does what it does because that works for google3 and other major users.
It seems that AlignAfterOpenBracket: DontAlign can work, but it is a very blunt hammer: https://discourse.llvm.org/t/misleading-indentation-on-if-statements-with-clang-format-alignafteropenbracket/89069, see https://clang.llvm.org/docs/ClangFormatStyleOptions.html#alignafteropenbracket.
Maybe https://clang.llvm.org/docs/ClangFormatStyleOptions.html#breakafteropenbracketif would work?
There was a problem hiding this comment.
With the clang-format I have on my machine,
ContinuationIndentWidth: 8
AlignAfterOpenBracket: DontAlign
works; I don’t have BreakAfterOpenBracketIf.
But of course it results in unnecessary wide continuation indents elsewhere.
Anyway, another example of why I don’t think clang-formatting everything in ICU is happening soon…
|
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
|
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
Currently, ICU4C rejects the UnicodeSet
[\N{PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET}], and accepts only the misspelling[\N{PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET}]. With this change, both expressions are valid and equivalent to[︘].Name aliases of other types do not work (ICU-8963). UAX44-LM2 loose matching does not work as specified (ICU-3736).
Checklist